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Abstract 

Commercial  off-the-shelf  (COTS)  component  usage  is  becoming  more  preva¬ 
lent  in  military  applications  due  to  current  Department  of  Defense  (DoD)  policies. 
The  easy  accessibility  of  COTS  will  give  reverse  engineers  a  higher  probability  of 
successfully  tampering,  coping,  or  reverse  engineering  circuits  that  contain  critical 
capabilities.  To  prevent  this  and  verify  the  trustworthiness  of  hardware,  circuit  iden¬ 
tification  tags  or  serials  numbers  can  be  used.  However,  these  values  can  be  easily 
obtained  and  forged.  To  protect  critical  DoD  technologies  from  possible  exploitation, 
there  is  an  urgent  need  for  a  reliable  method  to  confirm  a  circuit’s  identity  using  a 
set  of  unique  unforgettable  metrics. 

This  research  proposes  the  concept  of  creating  a  circuit  identifier,  or  digital 
fingerprint,  for  application  specific  integrated  circuits  (ASIC)  and  field  programmable 
gate  arrays  (FPGAs).  The  digital  fingerprint  would  be  a  function  of  the  natural 
variations  in  the  semiconductor  manufacturing  process  and  the  functionality  of  the 
circuit  allowing  the  creation  of  a  unique  identifier  for  a  specific  chip  that  can  not  be 
duplicated  or  forged. 

The  proposed  digital  fingerprint  allows  the  use  of  any  arbitrary  node  or  set  of 
nodes  internal  to  the  circuit  and  the  circuit  outputs  as  monitoring  locations.  Changes 
in  the  signal  on  a  selected  node  or  output  can  be  quantified  digitally  over  a  period 
of  time  or  at  a  specific  instance  of  time.  Two  monitoring  methods  are  proposed,  one 
using  cumulative  observation  of  the  nodes  and  the  other  samples  the  nodes  based  on 
a  signal  transition. 

Testing  of  the  two  monitoring  methods  was  performed  on  a  small  sample  of 
twenty  Xilinx®  Virtex-II  Pro  FPGAs.  Both  methods  successfully  created  unique 
identifiers  for  each  FPGA. 
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Digital  Fingerprinting 


of 

Field  Programmable  Gate  Arrays 

I.  Introduction 

Wa never  changes' 0ne  of  war’s  maior  tacets  is  the  dash  between  the  ted> 

▼  V  nology  of  opposed  civilizations.  From  copper  sword  bearing  tribes  that  were 
destroyed  by  those  equipped  with  iron  to  the  French  armies  that  suffered  greatly  un¬ 
der  English  longbowmen,  technological  advancement  over  ones  enemies  is  a  critical 
edge.  The  United  States  military  has  excelled  in  conventional  warfare  because  of 
the  continual  push  to  advance  its  technological  capabilities.  Foreign  governments  at¬ 
tempt  to  counter  this  by  either  purchasing  critical  components  [13]  or  salvaging  them. 
Regardless  of  how  they  are  acquired,  the  critical  components  that  provide  the  U.S. 
military  with  its  technological  edge  will  find  their  way  into  enemy  hands. 

Typically,  only  a  sample  or  two  of  an  Application  Specific  Integrated  Circuit 
(ASIC)  falls  into  enemy  hands  and  the  reverse  engineering  process  requires  multiple 
samples  as  the  circuit  is  destroyed  in  the  process.  This  is  changing  with  the  Depart¬ 
ment  of  Defense  (DoD)  pushing  more  military  systems  to  use  commercial  off-the-shelf 
(COTS)  parts  [8].  This  change  is  due  to  the  need  of  deploying  new  capabilities  to  the 
warfighter.  The  preference  is  to  give  warfighter  the  80%  solution  now,  rather  than 
the  100%  solution  in  Eve  to  ten  years  as  has  been  done  in  previous  years.  Utiliz¬ 
ing  COTS  components  and  integrating  them  rather  than  doing  a  complete  research 
and  development  effort  from  scratch  drastically  reduces  the  time  required  to  deliver 
new  capabilities.  Additionally,  the  use  of  COTS  provides  additional  program  savings 
due  to  ease  of  availability  and  inexpensiveness.  This  switch  to  COTS  also  involves  a 
switch  to  commercial  ASICs  and  Field  Programmable  Gate  Arrays  (FPGAs).  Unfor¬ 
tunately  the  same  reasons  that  make  COTS  ASICs  and  FPGAs  advantageous  are  also 
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liabilities,  namely  being  inexpensive  and  their  easy  of  availability  on  the  commercial 
market. 

This  makes  reverse  engineering  and  tampering  simpler  as  multiple  samples  can 
be  easily  purchased  with  no  governmental  oversight.  Once  the  functionality  is  reverse 
engineered,  design  of  a  functional  equivalent  circuit  is  trivial  and  the  technical  ad¬ 
vantage  is  lost.  Additionally,  hidden  lethal  functionality  could  be  added  and  the  now 
tainted  ASICs  could  be  returned  to  government  control,  either  via  normal  acquisition 
channels  or  through  the  return  of  equipment  loaned  to  U.S.  allies. 

This  potential  threat  has  become  more  prevalent  due  to  the  increased  use  of  FP- 
GAs  instead  of  traditional  ASICs.  Program  managers  have  increased  FPGA  usage  due 
to  their  lower  life-cycle  cost.  From  a  programmatic  view,  FPGAs  provide  a  lower  cost 
solution  in  three  areas  due  to  their  reprogrammability.  First,  the  design-implement- 
test  cycle  is  far  faster  because  new  functionality  does  not  have  to  be  shipped  out  to 
a  fabrication  center  and  can  be  done  on-chip.  Second,  out-year  fund  savings  occur 
as  systems  that  use  FPGAs  can  be  easily  updated  with  new  functionality,  allowing 
new  capabilities  to  be  used  by  the  warfighter  much  faster  than  the  acquisition  system 
would  normally  allow.  Third,  ASICs  typically  undergo  a  redesign  every  time  there 
is  an  improvement  in  fabrication  technology  costing  time  and  money.  FPGAs  are 
downwards  compatible,  meaning  that  old  configuration  programs  are  just  recompiled 
for  new  FPGA  designs. 

1.1  Problem  Statement 

Due  to  the  increasing  use  of  non-military  proprietary  hardware,  adversaries 
may  have  increased  their  chances  of  learning  how  COTS-based  military  hardware 
functions  through  reverse  engineering  the  COTS  components.  A  method  is  needed 
to  uniquely  identify  each  piece  of  hardware  and/or  generate  an  unique  signature  that 
can  not  be  duplicated  or  fabricated  by  adversaries.  Such  a  signature  would  allow  easy 
verification  of  loaned  hardware  to  prevent  modified  or  incorrect  circuits  with  hidden 
functionality  from  entering  the  acquisition  system.  Additionally,  it  could  provide 
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hardware  unique  encryption  and  decryption  keys.  The  nature  of  this  signature  will 
require  the  hardware  functionality  be  tied  to  the  physical  silicon  on  which  it  resides. 
The  first  step  in  signature  creation  is  identifying  a  methodology  for  differentiating 
multiple  functionally  and  structurally  identical  circuits  from  the  same  vendor. 

1.2  Goals 

The  goals  of  this  research  are: 

1.  Assess  circuit  signal  attributes  that  can  potentially  serve  as  part  of  an  identifier 

2.  Identify  the  most  likely  attribute  or  attributes  that  can  be  used  on  both  ASIC 
and  FPGA  circuits 

3.  Validate  the  proposed  attributes  in  an  FPGA  architecture 

1.3  Scope  and  Assumptions 

This  research  uses  the  Xilinx  Virtex-II  Pro  architecture  to  show  the  digital 
fingerprint  concept  due  to  the  time  and  cost  constraints  on  fabricating  an  ASIC  test 
circuit.  Implementation  can  be  performed  on  any  FPGA,  but  is  being  done  on  the 
Virtex-II  Pro  architecture  due  to  easy  access  to  multiple  samples. 

1.4  Methodology 

Multiple  physical  signal  attributes  are  examined  as  potential  candidates  for 
use  in  creation  of  a  digital  fingerprint.  One  or  more  will  be  selected  and  be  used 
as  the  basis  of  a  unique  identifier.  Very  High  Speed  Integrated  Circuit  (VHSIC) 
Hardware  Description  Language  (VHDL)  code  is  used  to  implement  and  show  this  on 
a  reconfigurable  FPGA. 
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1.5  Materials  and  Equipment 

The  following  software  and  hardware  is  used  in  this  research:  Mentor  Graphics 
ModelSim  PE  Student  Edition  6.3c,  MathWorks  Matlab®  6.5,  Xilinx  ISE  9.2i  and 
Virtex-II  Pro  XC2VP30  FPGA,  and  Agilent  16902  Logic  Analyzer. 

1 . 6  Overview 

This  research  is  organized  in  the  following  manner.  Chapter  II  contains  an 
introduction  to  the  architecture  of  the  Virtex-II  Pro,  tamper  methods  for  reverse 
engineering  and  altering  ASIC  and  FPGAs,  and  other  research  that  attempts  to  create 
unique  identifiers.  Chapter  III  describes  the  digital  fingerprint  implementation  that 
is  examined  in  this  research.  Chapter  IV  provides  the  test  circuits  for,  and  the  results 
gathered  from,  the  digital  fingerprint  implementation  in  Chapter  III  with  Chapter  V 
summarizing  and  concluding  this  research. 
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II.  Background 


This  chapter  provides  an  overview  of  how  a  FPGA  is  designed  structurally  and 
operates  as  the  Digital  Fingerprint  concept  described  in  Chapter  III  is  imple¬ 
mented  on  a  commercial  FPGA.  Some  of  the  tamper  methods  that  can  be  used  to 
alter  or  allow  the  theft  of  the  functionality  in  an  integrated  circuit  (IC)  are  examined. 
Finally,  a  summary  of  other  efforts  to  create  a  unique  identifier  for  ICs  is  provided. 

2.1  Field  Programmable  Gate  Array 

A  FPGA  is  a  special  type  of  ASIC  that  has  an  architecture  which  allows  the 
implementation  virtually  any  functionality.  ASIC  fabrication  has  a  large  development 
cost  and  any  change  to  the  underlying  functionality  or  in  the  fabrication  feature  size 
necessitates  a  redesign  and  new  fabrication  run.  As  a  result,  the  FPGA  has  become 
popular  in  the  commercial  market,  and  by  extension  in  DoD  acquisition  programs, 
due  to  the  easy  of  implementing  new  functionality.  Configuring  a  FPGA  involves  a 
design  engineer  writing  a  hardware  description  in  a  pseudo-computer  language,  VHDL 
or  Verilog,  and  downloading  it  to  the  FPGA  to  run.  Updates  are  as  simple  as  adding 
a  few  new  lines  of  code,  running  workbench  tests  on  a  single  chip,  and  uploading  the 
new  code. 

2.1.1  Architecture.  The  FPGA  is  a  special  type  of  ASIC  that  is  designed  to 
have  reconfigurable  functionality  based  on  a  string  of  input  bits.  The  basic  structure 
of  any  FPGA  can  be  seen  in  Figure  2.1.  An  FPGA  consists  of  four  main  components: 
the  input/output  (I/O)  pins,  which  allow  communication  between  the  internals  of 
the  FPGA  and  the  outside  world;  the  configurable  logic  blocks  (CLBs)  where  the 
actual  functionality  of  the  circuit  loaded  into  the  FPGA  resides;  the  routing  and 
routing  switches  which  connect  up  the  CLBs  and  the  I/O  pins.  The  amount  of  space 
used  by  all  the  CLBs  on  a  chip  is  small,  relative  to  the  connections  between  CLBs 
which  take  up  the  bulk  of  the  silicon.  Due  to  the  reconfigurable  nature  of  the  FPGA, 
these  connections  need  to  be  large  and  versatile  to  allow  a  wide  variety  of  routings. 
Figure  2.2  shows  the  internals  of  a  basic  CLB.  Inside  every  CLB  is  three  components, 
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Configurable  Logic  Block 
(CLB) 


Routing  Switch 


Routing 


Figure  2.1:  Basic  structure  of  an  FPGA,  consisting  of  I/O  pins,  CLBs,  routing 

wires,  routing  switches  [19]. 


a  Look-Up  Table  (LUT),  D  Flip-Flop  (DFF),  and  multiplexer  (MUX).  The  LLIT  is 
a  four  input  truth  table  that  can  implement  any  four  variable  Boolean  equation  and 
provide  a  single  output.  The  DFF  is  a  simple  storage  element  used  for  sequential 
circuits  whose  output  takes  on  the  input  value  upon  the  reading  of  a  clock  edge.  The 
multiplexer  allows  the  setting  of  the  CLB  to  either  perform  combinational  logic  by 
passing  the  LUT  output  directly  or  sequential  logic  by  passing  the  DFF  output.  In 
reality  the  CLB  is  much  more  complex  as  Figure  2.3  shows  one  of  four  slices  in  a 
CLB. 


2.1.2  Configuration.  Configuration  of  a  FPGA  is  done  using  a  data  stream 
known  as  the  bitstream.  The  bitstream  contains  all  the  configuration  information 
for  the  LUTs,  multiplexers,  DFFs,  and  CLB  interconnects  and  is  both  manufacturer 
and  chip  type  dependent.  Commonly,  the  bitstream  is  stored  in  static  random  access 
memory  (SRAM)  cells  on  the  FPGA.  SRAM,  maintains  its  contents  as  long  as  it 
is  powered,  but  being  a  volatile  memory  type  requires  a  reload  of  the  bitstream 
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Out 


Figure  2.2:  The  structure  of  a  basic  CLB. 


Figure  2.3:  The  internal  structure  of  a  Xilinx®  Virtex-II  Pro  CLB  slice  [22], 
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anytime  there  is  a  power  loss.  Some  FPGAs  use  non-volatile  memory  such  as  Flash 
or  antifuses.  These  are  mainly  used  on  FGPAs  that  require  extra  security  as  volatile 
memory  FPGAs  require  off  chip  bitstream  storage  and  provide  an  additional  attack 
vector  for  the  reverse  engineering  process. 

It  is  worth  noting  that  a  change  in  the  functionality  that  is  to  be  programmed 
into  the  FPGA  will  result  not  only  a  different  bitstream  but  a  different  part  of  the 
FPGA  being  configured  to  implement  that  functionality.  As  designers  typically  do 
not  care  how  the  internal  configuration  of  the  FPGA  is  done,  this  aspect  of  FPGA 
utilization  is  typically  overlooked  but  will  play  a  critical  part  in  the  implementation 
of  the  digital  fingerprint  on  an  FPGA. 

2.2  Tamper  Methods 

There  are  a  number  of  different  ways  of  discovering  and  altering  the  functionality 
of  a  circuit.  Collectively,  these  are  called  tamper  methods  and  can  apply  to  either 
ASICs  or  FPGAs. 

2.2.1  Brute  Force  Attacks.  Brute  force  attacks  interact  with  an  IC  through 
its  I/O  pins  and  are  performed  either  while  the  IC  is  in  natural  environment  or  stand¬ 
alone. 


2. 2. 1.1  Black  Box  Attacks.  Traditionally,  the  black  box  attack  has 
been  the  first  and  simplest  attack  to  perform  on  an  IC  in  an  attempt  to  reverse 
engineer  it.  Chips  are  examined  individually,  all  possible  input  combinations  applied 
and  the  outputs  recorded.  Using  a  large  truth  table,  data  analysis  algorithms,  or 
in  some  cases  visual  inspection,  the  underlying  Boolean  equations  that  define  the 
IC’s  logic  can  be  recreated  [21].  This  type  of  attack  only  works  well  on  circuits  with 
well  defined  inputs  and  outputs,  combinational  circuits,  and  latches  [9].  Synchronous 
circuits,  like  state  machines,  increase  the  complexity  of  reverse  engineering  as  an 


incorrect  input  at  the  wrong  time  may  cause  the  circuit  to  miss  a  state  transition  or 
reset,  resulting  in  an  incomplete  understanding  of  the  circuit. 

With  FPGAs,  the  challenges  are  greater,  due  to  the  number  of  pins  that  can 
be  assigned  as  either  inputs  or  outputs,  and  in  some  cases  both.  There  are  2"  input 
combinations  on  any  1C,  where  n  is  the  number  of  input  pins  [6].  The  basic  Xilinx® 
Virtex-Il  Pro  XC2VP2  introduced  in  2002,  has  204  pins  that  can  be  designated  as 
either  input,  output,  or  bi-directional  [23].  If  we  make  the  simplifying  assumptions 
that  the  circuit  is  purely  combinational,  and  half  the  pins  function  as  inputs  so  n  — 
102,  then  there  are: 

2 102  =  5,  070,  602, 400,  912,  917,  605,  986,  812,  821,  504 


or  «  5  *  1030  combinations  to  apply  to  the  circuit. 

If  the  reverse  engineer,  with  no  prior  knowledge  of  the  FPGA,  utilizing  a  state- 
of-the-art  automatic  test  equipment  (ATE)  running  at  1  GHz  (roughly  $1.5M),  picks 
the  correct  pins  to  apply  the  input  on  the  first  try,  it  would  take: 


5  *  10 30 inputs  * 


sec 

109  input 


min  hr  day  year 

* - * - * - * - 

60sec  60  min  24  hr  365 hr 


pc  1.6  *  10  uyears 


to  examine  the  entire  space  of  input  combinations.  This  is  long  after  the  Earth  has 
become  nothing  more  than  a  burnt  cinder  around  a  dead  sun. 


2.2. 1.2  Passive  Attacks.  The  next  step  up  from  the  black  box  attack 
is  the  passive  attack.  This  attack  is  conceptually  very  simple.  ICs  are  examined  in 
their  native  environment,  i.e.  while  they  are  being  used  in  an  actual  circuit.  Input 
and  output  lines  are  monitored,  typically  using  either  an  oscilloscope  or  logic  analyzer, 
and  data  is  recorded  giving  a  good  picture  of  the  chips  functionality.  This  attack  takes 
far  less  time  than  the  black  box  attack  as  an  exhaustive  search  of  all  possible  input 
combinations  is  not  necessary.  However,  if  the  chip  being  examined  is  synchronous 
in  nature,  it  is  possible  that  not  all  states  are  seen  leaving  out  vital  functionality. 
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PIC16F84 -  041/P 
9912CAW 


Figure  2.4:  PIC16F84  microcontroller  packaged  and  depackaged  [17]. 

Typically,  this  attack  is  used  as  an  initial  probe  to  provide  focus  for  a  follow-on  black 
box  attack  for  state  machines.  For  FPGAs,  the  passive  attack  greatly  simplifies  the 
black  box  attack  search  space  as  the  input  and  output  pin  are  readily  identified. 

2.2.2  Hardware  Attacks.  Hardware  attacks  focus  on  the  physical  silicon 
of  the  chip.  Typically,  this  involves  depackaging  the  1C,  as  shown  in  Figure  2.4, 
although  for  some  attacks  this  is  unnecessary.  This  allows  direct  interaction  at  the 
transistor  level  without  having  to  go  through  the  I/O  pins,  allowing  a  better  functional 
understanding.  These  attacks  carry  the  risk  of  destroying  the  IC  as  they  are  directly 
interacting  with  the  silicon.  Consequently,  multiple  samples  of  the  IC  being  attacked 
are  required.  In  the  case  of  commercial  chips,  this  is  a  fairly  simple  task,  resulting  in 
a  higher  likelihood  of  successful  reverse  engineering. 

2.2.2. 1  Mechanical  Probes.  Mechanical  probing  is  the  traditional 
method  of  reverse  engineering  post-depackaging  of  the  IC.  Small  metal  probes  are 
used  to  measure  voltage  and  current  at  the  transistor  level,  providing  a  finely  detailed 
look  at  the  circuit  functionality.  As  physical  contact  is  required  to  make  these  mea¬ 
surements,  there  exists  the  chance  that  the  probe  contact  will  destroy  some  of  the 
transistor  or  wires. 

2. 2. 2. 2  Focused  Ion  Beam.  The  focused  ion  beam  (FIB)  is  a  semi¬ 
conductor  fabrication  device  used  as  a  micro- machining  tool.  It  has  three  modes  of 
operation.  First  the  FIB  can  image  in  a  similar  method  to  the  scanning  electron  mi¬ 
croscope  (SEM)  but  using  gallium  ions  instead  of  electrons.  Unlike  the  SEM,  it  has  a 
destructive  effect  as  the  gallium  ions  are  implanted  into  the  sample  surface  changing 
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Figure  2.5:  FIB  created  image  of  a  circuit  altered  by  the  FIB’s  micro-machining 
capabilities  [15]. 

the  nature  of  the  underlying  surface  through  introduction  of  crystalline  defects  or  ad¬ 
ditional  electrons.  The  second  and  third  modes  are  its  micro-machining  capabilities, 
allowing  milling  or  metal  deposition,  typically  tungsten  [10].  Figure  2.5  shows  an  FIB 
image  of  a  chip  that  has  had  both  milling  and  metal  deposition  performed  on  it. 

2. 2. 2. 3  Optical.  Optical  tamper  methods  are  semi-invasive  as  they 
require  the  depackaging  of  the  IC  but  do  not  have  any  physical  interaction  with 
it.  Instead,  they  rely  on  the  interaction  of  photons  with  semiconductor  devices.  The 
photons  must  have  an  energy  higher  than  the  bandgap  of  the  material  being  examined, 
1.12  eV  for  silicon.  Lasers  are  often  used  as  they  can  be  focused  on  a  small  area. 
These  attacks  take  two  forms:  optical  probe  and  optical  attack.  Optical  probing 
focuses  on  circuit  examination  by  looking  at  transistor  states  [16].  Figure  2.6  shows 
an  SRAM  cell  under  optical  probe  and  was  created  by  measuring  the  photocurrent 
induced  by  the  laser  scanning  the  SRAM.  Color  indicates  the  measured  photocurrent,. 
The  optical  attack  works  by  activating  transistors  in  the  same  way  that  exposure  to 
gamma  radiation  would,  creating  free  electrons  by  exciting  them  to  the  conduction 
band  causing  a  bit  flip.  The  effect  on  the  IC,  e.g.  broad  area  ionization,  is  not  as 
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Figure  2.6:  Examination  of  SRAM  cells,  (a)  Shows  the  cells  unpowered.  (b)  16 
SRAM  cells  in  a  4x4  grid  loaded  with  values.  The  first  row  is  read  1100  [16]. 


great  as  gamma  irradiation  due  to  the  lower  energy  laser  generated  photons  and  the 
small  area  they  are  applied. 


2.2.3  Side  Channel  Attacks.  Rather  than  using  brute  force  to  physically 
probe  the  internals  of  an  IC,  areas  that  leak  unintended  information  known  as  side 
channels  are  used.  Side  channel  attacks  use  this  secondary  information  to  create  a 
picture  of  IC  functionality.  They  include  power  consumption  and  timing  analysis. 


2.2.3. 1  Power  Consumption.  Power  consumption  attacks  originated 
for,  and  still  mainly  focus  on,  breaking  cryptography  schemes.  For  reverse  engineer¬ 
ing,  the  concept  is  that  through  an  examination  of  the  power  utilization  of  an  IC, 
the  underlying  circuit  functionality  can  be  found.  There  are  two  types  of  power 
consumption  attacks:  Simple  Power  Analysis  (SPA)  and  Differential  Power  Analysis 
(DPA)  [14].  SPA  consists  of  creating  power  consumption  vs  time  graphs,  captured  at 
a  high  sample  rate,  of  the  power  or  ground  input.  Analysis  of  the  data  can  show  how 
the  chip  operates  on  the  instruction  level  [3].  This  attack  can  be  done  using  a  decent 
quality  digital  oscilloscope. 

DPA  is  similar  to  SPA  in  the  use  of  power  consumption  vs  time  graphs.  Ad¬ 
ditionally,  DPA  uses  statistical  analysis  and  error-correction  statistical  methods  [3]. 
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This  two-pronged  approach  gives  an  attacker  insight  into  the  data  values  that  are 
being  manipulated  on  a  chip.  It  is  possible  to  then  correlate  this  collected  data  to 
known  functions  in  order  to  see  exactly  what  is  happening.  This  attack,  due  to  the 
extra  analysis,  takes  longer  than  SPA,  but  provides  a  better  picture  of  the  internal 
functionality. 

2. 2. 3. 2  Timing  Analysis.  With  brute  force  attacks,  synchronous  cir¬ 
cuits  add  additional  complexity  in  the  reverse  engineering  process  due  to  the  timing 
constraints  that  are  introduced.  For  circuit  designers,  synchronous  circuits  are  much 
easier  to  design  as  the  clock  functions  as  a  control  signal  allowing  data  to  be  sent 
and  received  at  specific  times.  However,  the  predictability  of  the  clock  signal  and  its 
permeating  nature  throughout  the  circuit  can  be  used  as  an  attack  vector.  Timing 
attacks  focus  on  taking  the  circuit  outside  normal  parameters  by  modifying  the  speed 
of  the  clock,  either  speeding  up  or  slowing  it  down.  The  change  in  clock  speed  can 
cause  a  synchronous  circuit  to  miss  a  clock  edge  and  enter  into  an  incorrect  state  or 
miss  a  state  transition  entirely,  while  stopping  it  will  allow  an  attacker  to  see  what  is 
happening  internally  at  any  clock  cycle. 

2.2.4  FPGA  Specific  Attacks.  The  previously  mentioned  attacks  work  on 
both  ASICs  and  FPGAs.  As  FPGAs  have  the  ability  to  be  reconfigured,  there  are 
additional  attacks  that  can  be  used  by  the  reverse  engineer  that  focus  on  the  config¬ 
uration  bitstream. 

2.2.4. 1  Readback  Attacks.  A  common  feature  of  FPGAs  is  the  ability 
to  read  the  configuration  data  out.  This  feature  is  used  to  troubleshoot  programming 
that  has  been  loaded  onto  the  FPGA.  In  this  attack,  the  configuration  data  is  read 
from  the  FPGA  via  the  programming  or  Joint  Test  Action  Group  (JTAG)  port. 
Normally,  this  functionality  is  disabled  through  the  setting  of  a  security  bit.  The 
security  bit,  while  fairly  well  protected,  can  be  neutralized  by  using  a  fault  generating 
attack,  like  the  optical  attack. 
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2. 2. 4-2  Cloning  Attacks.  In  SRAM  and  other  volatile  memory  based 
FPGAs,  the  bitstream  is  stored  off-chip  in  non-volatile  memory,  usually  programmable 
read-only  memory  (PROM),  and  is  loaded  in  on  power-up.  This  provides  an  attacker 
easy  access  to  the  bitstream  by  just  monitoring  the  bus  lines  from  the  PROM  to  the 
FPGA.  The  bitstream  then  can  be  easily  loaded  into  another  FGPA  creating  an  exact 
clone.  This  attack  is  extremely  useful  as  it  allows  any  number  of  functional  copies  to 
be  made  and  be  reverse  engineered  in  parallel. 

2.3  Physically  Uncloneable  Functions 

A  review  of  current  literature  shows  that  most  efforts  in  the  creation  of  unique 
identifiers  focus  on  the  watermarking  of  intellectual  property,  mainly  in  terms  of 
functionality  implemented  on  FPGAs.  One  group  has  looked  at  the  creation  of  a 
unique  identifier  for  both  ASIC  and  FPGAs. 

2.3.1  Background.  In  2002,  MIT  published  a  paper,  [11],  on  what  they  called 
silicon  physical  random  functions.  These  functions,  later  renamed  physical  unclone¬ 
able  functions  (PUFs),  are  stand-alone  structures  specifically  designed  to  create  an 
ID  via  variations  in  internal  delay  of  the  PUF  function  due  to  manufacturing  process 
variations.  As  of  June  2007,  MIT  has  proposed  and  tested  two  PUF  circuits  based 
around  this  concept,  an  arbiter  and  a  ring  oscillator  [18].  Both  of  these  structures 
generate  their  IDs  based  on  the  relative  difference  of  the  transient  response  of  the 
wire  delay  after  an  input  is  applied. 

2.3.2  Arbiter  PUF.  The  arbiter  PUF  uses  a  single  signal  traveling  down  two 
wires  of  identical  length.  The  wires  pass  through  multiple  switching  boxes,  consisting 
of  two  MUXes  each,  to  a  DFF  at  the  end.  Control  of  the  switching  box  array  is  done 
via  the  PUF  input  bits,  as  shown  in  Figure  2.7.  A  value  of  X—0  has  the  signals 
pass  though  while  X=1  causes  the  signals  to  switch  lines.  As  the  signals  propagate 
through  the  circuit,  variations  in  the  wire  will  cause  changes  in  the  signals’  delay.  The 
DFF  outputs  a  0  or  1  depending  on  which  signal  arrives  first. 


14 


X[0]  X[1]  X[1 26]  X[1 27] 


Figure  2.7:  MIT’s  Arbiter  PUF.  Signals  can  switch  lines  depending  on  MUX  input 
bits.  The  output  Y  reports  which  path  was  faster  [18]. 

To  generate  an  ID  that  is  unique  over  multiple  ICs,  one  of  two  methods  can  be 
used.  First  is  to  apply  n  different  input  vectors  to  a  single  arbiter  PUF  to  get  an 
output  of  length  n.  Second  is  to  replicate  the  PUF  n  times  and  apply  one  input  to 
all  PUFs  simultaneously  giving  an  n-bit  output. 

2.3.3  Ring  Oscillator  PUF.  The  ring  oscillator  PUF,  or  RO  PUF,  uses 
variations  in  frequency  of  many  identically  laid-out  ring  oscillators  to  generate  a  circuit 
ID.  Manufacturing  variations  will  cause  changes  in  wire  and  gate  delay,  altering  the 
frequency  of  each  oscillator  and  the  number  of  pulses  generated  over  a  time  interval. 
The  oscillators  are  selected  in  pairs,  via  MUX,  and  the  number  of  pulses  over  an 
arbitrary  length  of  time  counted.  The  output  is  based  on  which  oscillator  had  the 
larger  number  of  pulses,  as  shown  in  Figure  2.8. 

Generating  a  unique  ID  is  harder  for  RO  PUFs  than  arbiter  PUFs.  Normally 
for  N  oscillators  there  are  N(N  —  l)/2  possible  combinations,  giving  that  many  iden¬ 
tification  bits.  Due  to  correlation,  if  the  number  of  pulses  of  oscillator  A  is  greater  the 
number  of  oscillator  B  and  B  >  C,  then  the  number  of  pulses  of  A  >  C.  The  actual 
number  of  possible  combinations  now  becomes  log2(N\).  Therefore,  128  oscillators 
produce  a  maximum  of  716  bits  and  1024  oscillators  produce  8769  bits  and  so  on. 
Alternatively,  each  oscillator  pair  could  only  be  used  once  so  128  pairs  of  oscillators, 
256  total,  would  give  a  128-bit  ID. 
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2.3.4  Examination  of  the  RO  PUF.  In  Suh  et  al  [18],  results  of  an  RO 
PUF  implemented  across  15  Virtex-4  LX25  FPGAs  are  reported.  A  128-bit  FPGA 
signature  is  generated  using  1024  ring  oscillators,  each  consisting  of  five  inverters 
and  one  AND  gate.  As  only  256  oscillators  are  needed  to  generate  a  128-bit  output, 
using  each  RO  once,  the  additional  ring  oscillators  are  used  to  ensure  a  large  enough 
frequency  difference  is  present  between  pairs  to  generate  reliable  outputs.  This  results 
in  only  a  quarter  of  the  total  number  of  ring  oscillators  being  used  to  generate  an 
identification  signature.  By  using  each  oscillator  only  once,  only  1.5%  of  the  total 
number  of  potential  identification  bits  are  used. 

The  data  shows  an  average  variation  between  FPGAs  of  46.15%  when  comparing 
output  bits  across  all  15  FPGAs.  However,  this  data  only  uses  105  out  of  the  128-bits 
generated.  No  mention  is  made  of  the  missing  23  bits.  If  there  was  no  change  in 
their  output  they  would  have  been  counted.  One  can  deduce  that  the  outputs  were 
not  stable  and  provided  fluctuating  values  on  the  counters  over  multiple  runs.  For 
the  105  bits  that  were  used,  no  mention  is  made  of  whether  it  is  always  the  same 
bits  or  different  groupings  of  105  out  of  the  128  bits.  Also  not  mentioned  is  the  effect 
of  LUT  simplification  on  the  RO  PUF.  The  Xilinx®  ISE  reduces  combinational  logic 
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structures  down  to  Boolean  equations  for  instantiation  as  LUTs.  The  ring  oscillators 
used  in  [18]  consist  of  five  inverters  and  one  AND  gate,  which  reduce  to  a  single  LUT. 
Without  more  information  it  is  hard  to  elaborate  on,  or  evaluate,  the  performance 
of  the  MIT  PUF  design.  Regardless,  it  can  be  concluded  based  on  the  designs  given 
that  PUFs  rely  on  a  finalized  stable  output  values  to  characterize  the  circuit. 
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III.  The  FPGA  Digital  Fingerprint 


he  creation  of  a  unique  identifier  for  a  chip  has  traditionally  been  done  by 


-L  industry  via  use  of  a  serial  number,  either  imprinted  on  the  chip  package  or 
applying  it  to  the  chip  after  fabrication,  like  Intel’s  CPU  ID  on  the  Pentium  III. 
Neither  of  these  IDs  is  secure,  as  the  first  can  be  negated  by  producing  either  a 


counterfeit  package  or  replacing  the  chip  inside  the  a  legitimate  package  with  another. 


The  second  ID,  being  applied  after  fabrication  is  still  susceptible  to  either  change  or 
copying  by  a  third  party. 

3.1  The  Digital  Fingerprint  -  Overview 

The  digital  fingerprint  creates  an  unique  identifier  based  on  the  characterization 
of  signals  that  are  unique  to  a  particular  physical  implementation  of  a  functionality. 
Ideally,  all  chips  carrying  the  same  circuit  design  and  fabricated  in  the  same  way  will 
be  exactly  identical.  This  is  not  true.  The  semiconductor  fabrication  process  is  not 
perfect  and  minor  variations  exist  in  each  chip.  These  variations  can,  for  example, 
happen  in  the  doping  profile,  mask  alignment,  metal  deposition,  oxide  growth,  or 
transistor  gate  width  and  length  and  have  been  proven  to  have  a  statistically  signifi¬ 
cant  effect  on  some  circuit  attributes  [4,5].  The  effect  that  these  variations  have  can 
be  seen  in  Intel’s  processor  lines.  In  examining  their  Core  2  Duo  with  1333  MHz 
front-side  bus,  the  clock  speeds  for  the  E6850,  E6750,  and  E6550  CPUs  go  from  3  to 
2.66  to  2.33  GHz  respectively  [7].  Fabrication  facilities  are  not  cheap,  costing  in  the 
billions  of  dollars,  and  run  twenty-four  hours  a  day.  No  profitable  corporation  would 
purposely  fabricate  three  different  processors  with  identical  specifications  except  for 
small  differences  in  clock  speed  as  it  would  impact  long  term  profitability. 

The  waveforms  that  occur  on  a  signal  line  inside  a  circuit  are  dependent  on 
both  the  circuit  functionality  that  drives  the  signal  line  and  the  variations  of  the 
physical  implementation  of  that  functionality.  A  fingerprint  that  examines  the  signal 
characteristics  at  a  specific  node  can  create  an  identifier  that  will  be  unique  to  a  chip 
due  to  these  two  dependencies. 
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3.2  Circuit  Attributes 


It  is  commonly  believed  by  people  today  that  they  live  in  a  digital  world  with 
all  the  high-tech  devices  like  computers,  cell  phones,  and  high-definition  television 
operating  by  l’s  and  0’s.  The  truth  is  that  the  world  is  actually  analog  and  that 
the  devices  considered  digital  are,  in  reality,  dealing  with  analog  signals  that  are  in¬ 
terpreted  as  digital  values.  These  analog  signals  allow  us  to  use  a  number  of  circuit 
attributes  to  create  a  digital  fingerprint.  The  attributes  are  based  on  the  design  of 
the  functional  logic  that  is  being  implemented  and  the  variations  in  physical  imple¬ 
mentation  and  have  an  effect  both  on  a  signal  and  on  how  it  is  interpreted  by  the 
surrounding  logic. 

3.2. 1  Voltage.  The  voltage  on  a  signal  line  generated  by  a  circuit  determines 
whether  other  circuit  components  will  interpret  a  logic  0  or  logic  1. 

3.2.2  Noise  Margin.  The  noise  margin  is  the  range  that  an  input  voltage  to 
a  circuit  component  can  be  so  that  it  is  interpreted  as  a  logical  0  or  1.  For  a  signal 
that  can  swing  between  0  V  and  5  V,  the  noise  margin  may  be  0.5  V  for  a  logic  0  and 
4.5  V  for  a  logic  1. 

3.2.3  Current.  Voltages  move  on  a  signal  line  due  to  the  current  also  flowing. 
Changes  in  the  current  can  result  in  some  voltage  levels  not  reaching  other  circuit 
components. 

3.2.4  Setup/Hold,  Time.  In  combinational  circuits,  changes  at  the  inputs 
can  happen  at  any  rate  with  no  adverse  affects  on  the  output  value  other  than  glitches. 
Flip-flops  and  latches,  due  to  the  cross-coupled  feedback  loops  in  the  NAND  gates  of 
the  basic  cell,  as  shown  in  Figure  3.1,  have  input  constraints.  Setup  time  is  how  long 
the  D  input  must  remain  constant  until  the  clock  input  reaches  50%  max,  assuming  a 
rising  edge  triggered  D  flip-flop.  Hold  time  is  the  time  that  the  D  input  must  be  held 
constant  after  the  50%  point  of  the  clock  input  to  ensure  that  metastability  doesn’t 
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Figure  3.1:  A  Master-Slave  D  Flip-Flop  with  highlighted  basic  cell. 


elk 

_ L_ 


Figure  3.2:  Setup  and  hold  times  for  the  D  flip-flop  and  latch  [19]. 


happen.  For  a  latch,  setup  and  hold  times  must  be  observed  around  the  falling  clock 
edge.  Simply,  there  is  a  time  t  that  the  input  must  hold  constant  around  a  clock  edge 
for  proper  functionality.  Figure  3.2  shows  this  pictorially.  In  the  case  where  either  of 
these  times  is  violated,  the  output  response  is  as  shown  in  Figure  3.3. 


3.2.5  Delay.  There  are  several  different  delays,  which  are  described  below. 


3.2.5. 1  Contamination  Delay.  Contamination  delay  is  the  minimum 
time  required  for  a  logic  gate  or  combinational  circuit  input,  measured  at  50%  of  its 
final  value,  to  effect  it’s  output,  again  measured  at  50%  of  its  final  value  [19]. 

3. 2. 5. 2  Propagation  Delay.  Similar  to  contamination  delay,  propaga¬ 
tion  delay  is  the  maximum  time  required  for  an  input  measured  at  50%  of  its  final 
value  to  cause  the  output  to  reach  50%  of  its  final  value  [19].  Between  the  contami- 
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Figure  3.3:  Setup  and  hold  time  violations  for  a  D  Flip-Flop.  The  first  and  second 
pulses  violate  a  flip-flop’s  setup  and  hold  time  respectively,  resulting  in  no  change  in 
the  output.  The  third  pulse  is  wide  enough  that  it  meets  both  time  requirements  and 
results  in  a  1  on  the  flip-flop  output. 

nation  and  propagation  delays,  the  output  of  the  combinational  circuit  may  begin  to 
glitch,  or  show  multiple  transitions  between  logic  0  and  logic  1,  as  the  logic  gates  and 
paths  through  the  combination  logic  to  the  output  will  have  non-uniform  contamina¬ 
tion  and  propagation  delays.  It  can  be  argued  that  propagation  and  contamination 
delay  are  analog  in  nature.  However,  if  you  look  at  their  effect  on  a  output,  they  can 
be  considered  digital. 

3.2. 5.3  Critical  Path  Delay.  Critical  paths  in  a  logic  circuit,  whether 
combinational  or  sequential,  are  the  slowest  and  limit  the  speed  of  the  circuit.  The 
critical  path  delay  is  an  extension  of  the  propagation  delay  and  is  the  longest  amount 
of  time  in  a  circuit  for  a  change  in  the  input  to  change  the  output. 

3.3  Stand-alone  Structure  Digital  Fingerprint 

This  section  looks  at  an  attempt  to  create  a  digital  fingerprint  via  a  stand-alone 
structure  using  an  asynchronous  linear  feedback  shift  register  (LSFR). 

3.3.1  LFSR  Architecture.  The  LFSR  is  a  standard  DFF  based  shift  register 
with  two  additional  features:  a  feedback  loop  from  the  most  significant  bit  (MSB) 
output  to  the  least  significant  bit  (LSB)  input,  and  XOR  gates,  called  taps,  causing 
the  LSB  to  be  a  linear  function  of  the  previous  state  [1,6]. 
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Figure  3.4:  LFSR  in  (a)  external  configuration  [20]  and  (b)  internal  configuration 
[20] .  The  squares  highlight  which  registers  are  connected  to  the  XOR  taps. 


3.3.2  Types  of  LFSR.  There  are  two  ways  of  positioning  the  taps,  creating 
two  configurations  or  types  of  LFSR,  formally  known  as  Fibonacci  and  Galois,  but 
more  commonly  referred  to  as  external  and  internal  respectively.  In  external  LFSRs, 
taps  are  placed  in  the  feedback  loop,  creating  a  mod-2  sum  of  all  the  taps  to  appear 
on  the  input.  For  internal  LFSRs,  the  taps  are  placed  between  the  flip-flops,  as  shown 
in  Figure  3.4.  Both  architectures  have  low  gate  delays,  however  the  parallel  nature  of 
XOR  positioning  in  the  internal  design  allows  smaller  delays  than  the  external  [12]. 


3.3.3  LFSR  Operation.  The  output  of  the  LFSR  is  controlled  by  three 
parameters:  clock,  tap  positions,  and  the  initial  value  that  is  loaded  into  the  LFSR  or 
seed.  Depending  on  where  the  taps  are  placed,  the  LFSR  will  have  two  or  more  state 
spaces  that  it  can  potentially  cycle  through.  Which  state  space  used  is  dependent 
on  the  seed  value.  For  every  n-bit  LFSR  there  is  a  set  of  tap  positions  that  will 
give  only  two  state  spaces.  One  space  has  a  length  of  2n  —  1  states,  known  as  the 
maximal  length,  and  the  other  consisting  of  the  all  0  state.  Typically  used  is  the  tap 
configuration  to  produce  the  maximal  length  state  space  as  the  MSB  output  acts  as 
pseudo-random  number  generator.  The  period  of  the  pseudo-random  numbers,  i.e. 
until  they  begin  to  repeat,  is  equal  to  the  size  of  the  state  space. 


3.3.4  Asynchronous  LFSR.  LSFRs  use  DFFs  for  bit  storage  which  only 
update  their  value  on  a  clock  edge.  Latches  operate  nearly  the  same,  but  have  an 
additional  quality,  transparency.  Transparency  causes  the  latch  to  update  its  output 
while  the  clock  is  high.  Figure  3.5  shows  a  timing  diagram  with  the  output  of  a 
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Figure  3.5:  DFF  and  Latch  timing  diagram  for  output  Q  in  response  to  input  D 

w/clock  [19].  This  lines  indicate  the  relationship  between  the  signal  transitions.  Note 
the  latch  shows  transparency  by  changing  its  output  in  response  to  D  while  elk  is 
high. 

flip-flop  and  a  latch.  Common  practice  is  to  avoid  transparency  as  a  system  with  it 
will  behave  unpredictably.  However,  with  the  transparency  of  the  latch  the  states  will 
continue  to  change  during  the  half  clock  cycle  in  which  the  clock  is  high.  When  the 
clock  goes  low,  the  setup  and  hold  times  of  latches  come  into  play  and  any  transition 
on  the  latch  input  that  violates  these  will  not  be  seen.  These  times  along  with  the 
latches’  internal  delay  and  the  delays  of  the  taps  and  wires,  will  determine  the  final 
state  that  the  modified  LFSR  is  in.  As  these  delays  vary  across  FPGAs,  the  final  state 
should  be  different  for  every  FPGA.  The  digital  fingerprint  for  the  circuit,  then,  is 
the  value  stored  in  the  LFSR  after  a  clock  pulse.  As  the  clock  pulse  that  triggers  the 
LFSR  will  also  have  some  variation,  this  will  add  an  additional  factor  to  the  creation 
of  a  digital  fingerprint. 
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Figure  3.6:  Block  structure  of  NCS  fingerprint  method. 


3.4  Transition  Based  Fingerprints 

Instead  of  using  a  stand-alone  structure  to  create  a  digital  fingerprint,  we  can 
use  the  transitions  between  ‘0’  and  ‘1’. 


3-4-1  Nodal  Cumulative  Sampling.  The  nodal  cumulative  sampling  (NCS) 
method  involves  the  summation  of  transitions,  either  0  — >  1  or  1  — >  0,  over  multiple 
signal  lines  to  create  a  digital  fingerprint.  These  lines  are  from  various  places  in  a 
circuit  and  are  connected  to  the  clock  input  of  one-hot-encoded  shift  register,  as  shown 
in  Figure  3.6.  The  register  has  a  ‘F  value  on  the  LSB  and  as  transitions  are  detected 
on  a  signal  line,  ‘0’s  are  shifted  in  causing  the  T’  to  shift  towards  the  MSB.  The  bit 
that  the  ‘1’  ends  on  after  all  transitions  are  counted  is  the  fingerprint  value  for  that 
line.  Performing  this  process  over  multiple  lines  results  in  a  base-n  digital  fingerprint, 
where  n  is  the  maximum  number  of  transitions  seen  for  the  signal  set.  Variations  in 
the  signals  and  the  setup/hold  times  of  the  one- hot  shift  register  will  result  different 
fingerprint  values  across  multiple  circuit  implementations.  As  an  example,  Figure  3.7 
shows  the  basic  concept  by  performing  the  summation  of  the  0  — >  1  transitions  for 
two  signal  lines.  The  fast  0  — >  1  transition  in  the  bottom  signal  may  or  may  not 
be  captured  by  the  shift  register  depending  on  its  setup/hold  times  resulting  in  two 
unique  identifiers. 
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Figure  3.8:  Block  structure  of  the  transitional  sampling  fingerprint  method  w/signal 
2  acting  as  a  clock  to  perform  the  sampling. 

3-4-2  Transitional  Sampling.  Transitional  sampling  involves  capturing  the 
current  value  on  multiple  signal  lines  through  the  use  of  a  shift  register,  as  shown  in 
Figure  3.8.  The  trigger  to  sample  the  signals  is  a  transition,  either  0  — >  1  or  1  — >  0. 
Figure  3.9  shows  three  different  fingerprints  created  using  the  sample  six  signals  with 
the  sampling  happening  at  different  transitions  on  different  signals.  As  with  the  NCS 
method,  variations  in  the  signals  will  result  in  different  digital  fingerprints. 
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(a)  Sampled  on  second  transition,  signal  six  (b)  Sampled  on  third  transition,  signal  six 


(c)  Sampled  on  third  transition,  signal  4 


Figure  3.9:  Transitional  sampling  performed  on  six  signals. 
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IV.  Test  Circuits  and  Results 


This  chapter  describes  the  test  setups  and  results  of  the  three  proposed  digital 
fingerprint  methods  when  implemented  on  a  Xilinx®  Virtex-II  Pro  FPGA. 

4-1  Digital  Fingerprint  Implementation  on  an  FPGA 

Due  to  time  constraints,  the  digital  fingerprint  will  be  tested  on  using  an  FPGA 
rather  and  fabricating  an  ASIC.  Testing  will  be  done  on  a  Xilinx®  Virtex-II  Pro 
XC2VP30  FPGA  which  is  mounted  to  a  Xilinx®  Virtex-11  Pro  Development  System 
board  that  is  part  of  their  Xilinx®  University  Program  (XUP).  A  picture  of  the  board 
can  be  seen  in  Figure  4.1. 

The  use  of  this  board  results  in  a  more  realistic  test  as  not  only  is  the  FPGA 
being  powered,  but  so  is  all  the  other  components  on  the  board.  All  of  the  circuits 
were  designed  structurally  in  VHDL  and  synthesized  using  the  Xilinx®  1SE. 

4-2  Asynchronous  LFSR  Fingerprint  Results 

Initial  tests  with  a  VHDL  coded  64-bit  flip-flop  LFSR  running  on  the  Virtex- 
II  Pro  FPGA  showed  correct  functionality.  When  latches  were  used  and  the  LFSR 
clocked,  the  output  viewed  by  an  attached  logic  analyzer  showed  all  zeros.  This  pre¬ 
sented  a  conundrum,  as  the  LFSR  was  configured  to  maximal  length  with  a  non- zero 
seed,  and  therefore  unable  to  enter  into  the  all  0  state  normally.  Analysis  performed 
on  a  1-bit  latch  LFSR,  with  an  XOR  gate  on  the  input  and  the  feedback  loop  con¬ 
nected  to  one  of  the  XOR  inputs  showed  proper  functionality.  The  addition  of  a 
second  latch  caused  the  all  0  state  to  reappear.  The  VHDL  code  was  then  imple¬ 
mented  on  an  Altera  Cyclone  2  FPGA  to  check  if  the  problem  was  vendor  specific. 
This  resulted  in  a  reoccurrence  of  the  all  0  state.  From  these  results,  it  was  reasonable 
to  conclude  that  a  multiple  latch  feedback  loop  does  not  work  across  multiple  CLBs 
due  to  the  fundamental  design  of  commercial  FPGAs,  likely  a  safety  net  of  some  sort 
that  prevents  11011-stable  signal  transitions.  This  digital  fingerprint  generation  method 
theoretically  will  work,  but  will  have  to  be  implemented  as  an  ASIC. 
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Figure  4.1:  The  XUP  development  board. 


4-3  Transition  Based  Fingerprint  Test  Circuits 

Any  number  of  functionally  complex  circuits  could  be  used  to  validate  the  tran¬ 
sition  based  digital  fingerprint  methods.  As  design  synthesis  on  to  an  FPGA  results 
in  combinational  logic  being  implemented  as  LUTs,  a  functionally  simple  circuit  can 
allow  access  to  the  LUTs  just  as  well  as  a  complex  circuit.  Therefore,  a  combina¬ 
tional  multiplier  will  be  utilized.  This  does  not  mean  that  the  digital  fingerprint  is 
limited  to  the  output  of  the  circuit  being  examined.  Any  internal  node  that  has  signal 
transitions  can  be  utilized  in  the  creation  of  the  digital  fingerprint. 


4-3.1  Combinational  Multiplier.  The  1-bit  full  adder,  as  shown  in  Figure  4.2, 
is  used  in  beginning  undergraduate  digital  logic  classes  to  show  the  effect  of  gate 
delay  on  a  circuit.  Full  adders  are  chained  together  to  create  ripple  adders,  where 
the  carry-out  of  the  nth  adder  is  connected  to  the  carry-in  of  the  (n  +  l)th  adder. 
More  significant  bits  are  required  to  wait  for  carry  values  to  propagate  through  before 
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Figure  4.2:  (a)  The  1-bit  full  adder  at  the  gate  level,  (b)  Truth  table  of  the  full 

adder. 

reporting  a  final  result.  In  the  mean  time,  intermediate  values  are  placed  on  the 
outputs  which  then  change  when  updated  with  the  carry  results.  This  causes  the 
outputs  to  fluctuate,  i.e.  glitch,  and  perform  multiple  transitions  between  0  and  1  until 
finally  settling  to  a  constant,  final  value.  After  modifying  the  full  adder  to  perform 
partial  product  multiplication,  as  shown  in  Figure  4.3,  multiple  full  adder  chains  can 
be  combined  to  create  a  combinational  multiplier,  as  shown  in  Figure  4.4.  When 
reduced  to  LUTs,  each  full  adder  will  become  a  pair  of  LUTs,  one  each  for  the  sum 
and  carry-out.  Upon  applying  inputs,  the  sums  and  carrys  are  propagated  throughout 
the  array  causing  potentially  a  large  number  of  transitions  on  the  outputs.  While  the 
number  of  transitions  that  appear  on  the  multiplier  output  will  remain  consistent 
across  all  FPGAs,  the  length  of  the  transitions  will  change  due  to  variations  in  the 
FPGA. 
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Figure  4.3:  Partial  product  1-bit  full  adder.  The  AND  gate  creates  the  partial 

product.  [2] 


x(«-1) - x(2)  x(1)  x(0) 

y(0) 


■y(D 

y(2) 


y(n-1) 


Figure  4.4:  Full  adder  arrangement  for  creating  a  multiplier  [2], 
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4-3.2  Test  Circuits  Input  Via  LFSR.  Inputs  for  the  test  circuits  are  provided 
by  an  LFSR  to  minimize  the  input  signal  skew.  Using  external  function  or  waveform 
generators  would  require  a  large  amount  of  difficult  synchronization  of  the  inputs. 

4-3.3  Test  Circuits  Limitations.  Due  to  the  implementation  of  positive  edge 
triggered  DFFs  in  the  FPGA,  only  0  — >  1  transitions  can  be  measured.  While  placing 
an  inverter  on  the  DFF  clock  input  would  allow  examination  of  1  — »  0  transitions,  it 
would  also  remove  many  of  the  short  transitions  that  would  allow  the  FPGAs  to  be 
differentiated.  Also,  inputs  are  provided  to  the  multiplier  test  circuit  via  an  LFSR 
internally,  rather  than  an  external  input  due  to  the  lack  of  expansion  header  pins  on 
the  XUP  board. 

4-4  Nodal  Cumulative  Sampling 

This  circuit  looks  at  the  feasibility  of  counting  transitions  generated  by  the 
multiplier  and  the  differences  in  their  quantity  across  multiple  FPGAs. 

4-4-1  Structure.  For  this  circuit,  we  will  use  a  32-bit  combinational  multi¬ 
plier,  also  known  as  a  32x32  multiplier.  Input  values  will  be  loaded  via  LFSR  and  the 
output  of  each  bit  connected  to  the  clock  input  of  a  20-bit  wide  one- hot-encoded  shift 
register,  as  shown  in  Figure  4.5.  A  shift  register  is  used  instead  of  a  binary  counter 
due  to  the  simpler  implementation  as  the  counter  logic  would  require  additional  LUT 
overhead.  As  a  transition  occurs,  the  shift  register  will  move  a  pre-loaded  ‘P  from 
the  LSB  through  the  register  to  the  MSB.  The  position  that  it  stops  on  indicates 
the  number  of  transitions  seen.  Only  transitions  with  a  long  enough  setup  and  hold 
time  will  activate  a  shift  as  shown  in  Figure  3.3.  Naturally,  this  implies  that  due  to 
limitations  caused  by  the  shift  registers  we  may  not  be  able  to  record  all  transitions, 
however  more  will  be  seen  than  if  just  using  a  logic  analyzer  directly.  Additionally, 
the  transitions  that  are  counted  are  not  only  dependent  on  the  delay  variations  of 
the  LUTs  but  also  on  the  variations  of  the  DFFs  used  to  create  the  shift  registers. 
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Figure  4.5:  Structure  of  the  nodal  cumulative  sampling  circuit. 

This  will  help  create  more  variation  between  circuits  that  can  be  exploited  for  FPGA 
differentiation. 

4-4-2  ModelSim  Results.  The  circuit  is  structurally  simulated  in  ModelSim 
using  full  adders  to  gain  an  understanding  of  the  number  of  transitions  that  may  be 
seen  on  the  outputs.  The  simulation  is  limited  by  a  number  of  factors:  the  minimum 
timestep  allowed  is  1  ns,  there  is  no  variation  due  to  manufacturing  in  the  delays  of  the 
full  adders,  the  shift  register  latches  in  all  data  immediately  upon  a  clock  signal  edge, 
and  all  interconnection  wires  are  considered  perfect.  Due  to  the  speed  of  the  LUTs 
and  the  setup/hold  times  of  the  shift  registers,  it  is  likely  that  not  all  the  transitions 
shown  by  the  ModelSim  output  will  actually  be  seen  experimentally.  Figure  4.6  shows 
a  single  bit  of  the  multiplier  output  along  with  the  resulting  expected  waveform  of 
the  shift  register. 

4-4-3  Multiplier  Input  Combinations.  A  multiplier  input  combination  needs 
to  provide  fairly  constant  results  over  a  large  number  of  runs.  Examination  of  Model¬ 
Sim  results  for  the  combination  multiplier  show  that  the  number  of  transitions  increase 
near  the  center  of  the  multiplier  output. 
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Implementing  the  transitional  sampling  circuit  and  using  bit  32  as  a  gauge  of  the 
number  of  transitions,  multiple  input  combinations  are  applied  and  output  stability 
is  examined.  Results  show  that,  when  holding  one  multiplier  input  as  OxFFFFFFFF, 
output  stability  occurs  when  the  second  input,  having  n  consecutive  Is,  has  n  +  1 
consecutive  Os  following,  i.e.  1100011000.  Analysis  of  the  full  adder/LUT  array  shows 
that  this  is  due  to  the  structure  of  the  array  and  the  speed  at  which  the  array  operates. 
Every  row  of  the  multiplier  that  reports  a  sum  value  of  ‘l’s  potentially  creates  a  set  of 
transitions.  Multiple  rows  of  ‘l’s  cause  both  the  sum  and  carry-out  outputs  to  change, 
Figure  4.7.  Too  many  simultaneous  ripple  adder  rows  of  ‘l’s  causes  a  large  number  of 
fast  changing  transitions  to  appear  at  the  output.  These  transitions  violate  the  shift 
register  clock  input  setup  and  hold  times  causing  them  not  to  be  read.  Rows  of  ‘0’s  act 
as  a  buffer  by  reducing  the  total  number  of  transitions  on  the  output  and  increasing 
the  length  of  the  remaining  transitions.  To  make  sure  that  output  transitions  can  be 
consistently  read,  inputs  of  OxFFFFFFFF  and  0x49249149  are  being  used. 

4-4-4  Nodal  Cumulative  Sampling  Results. 

4-4-4- 1  General  Observations.  Data  was  collected  on  10  FPGAs,  for 
64  bits  per  FPGA.  Analysis  of  the  data  shows  that  the  majority  of  multiplier  outputs 
produce  consistent  transition  counts  over  100+  runs.  The  number  of  transitions  on 
a  particular  output  varies  between  different  FPGAs,  typically  by  a  value  of  1.  In 
comparison  with  ModelSim  predictions,  the  number  of  transitions  are  about  half  the 
predicted  number.  This  is  most  likely  due  to  the  speed  and  width  of  the  individual 
transition  and  the  shift  register  setup  and  hold  times.  Bits  0-5  and  58-63  match  the 
predicted  values,  most  likely  due  to  the  fewer  number  of  LUTs  on  these  outputs, 
causing  fewer  transitions  and  allowing  them  to  hold  a  1  value  longer  so  that  they  can 
be  read  by  the  shift  registers.  Bits  16-47  show  the  most  activity  with  transition  counts 
ranging  between  4  to  6.  Some  multiplier  outputs  do  not  produce  a  constant  number 
of  transitions.  These  oscillating  output  values  show  constant  transition  counts  on 
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Figure  4.7:  4-bit  combinational  multiplier  array  showing  propagation  of  sum  and 
cout  for  15  *  5  which  are  applied  to  the  partial  product  AND  gates  on  each  adder.  Red 
values  show  adder  transitions  due  to  delays. 

other  FPGAs.  There  appears  to  be  no  discernible  pattern  for  which  outputs  show  the 
oscillation. 


4-4-4-%  Waveform  Analysis.  Examination  of  the  shift  register  wave¬ 
forms  show  several  different  phenomena  that  effect  the  final  transition  count.  Some 
waveforms  show  a  missed  transition  count  as  a  shift  register  output  does  not  show 
any  activity,  yet  the  next  register  reads  ‘1’  when  clocked.  This  is  due  to  sampling 
error  by  the  logic  analyzer  due  to  the  missing  transition  happening  faster  than  the 
analyzer  can  sample.  At  other  times  the  transition  disappears.  A  shift  register  makes 
a  transition  from  1  — >  0  showing  the  ‘1’  has  shifted,  but  the  next  register  does  not 
show  the  value.  This  can  be  attributed  to  a  setup/hold  time  violation  on  the  D  input 
of  the  register.  Figures  4.8(a)  and  4.8(b)  show  these  missed  transitions.  Similarly, 
the  time  multiple  shift  registers  output  constant  ‘l’s,  Figure  4.8(c).  This  is  also  due 
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to  setup/hold  time  violations,  in  this  case  for  the  clock  line,  therefore  the  lower  ‘1’ 
loaded  register  does  not  know  to  read  its  input  value. 

4-4 -4 -3  Nodal  Cumulative  Sampling  Digital  Fingerprint.  As  the  num¬ 
ber  of  transitions  on  an  output  change  between  FPGAs,  these  values  can  be  used 
to  create  a  digital  fingerprint.  Transition  counts  range  from  0-6  suggesting  a  base-8 
fingerprint.  For  a  32-bit  multiplier  output  (64  bits),  this  then  gives  the  theoretical 
maximum  of  unique  fingerprints  as: 


8 64  «  6.28  *  1054 

Examination  of  the  data  shows  that  the  actual  number  is  less  due  to  two  con¬ 
straints.  First,  a  number  of  outputs,  mainly  focused  in  outputs  0-5  and  59-63,  report 
the  same  value  across  all  FPGAs.  This  is  due  to  a  lack  of  transitions  on  these  signal 
lines,  as  the  number  of  LUTs  for  feeding  these  outputs  is  lower  than  for  other  outputs 
due  to  the  staggered  nature  of  the  ripple  adders. 

The  second  constraint  is  the  number  of  outputs  that  show  some  oscillation  of 
their  transition  count  values.  Depending  on  the  severity  of  the  oscillation,  these 
outputs  could  still  be  used  but  a  error  correction  factor  for  the  fingerprints  would 
have  to  be  created  to  account  for  this  activity.  As  an  example  of  the  effect  of  these 
constraints,  Table  4.1  shows  outputs  0-15  for  all  ten  FPGAs.  Only  five  of  the  outputs 
meet  the  constraints.  Output  12,  and  others  like  it  that  show  no  change,  may  also 
meet  the  constraints  with  testing  of  additional  FPGAs. 

Applying  these  two  constraints  to  the  entire  multiplier  output  results  in  18 
outputs  that  can  be  used  to  create  a  digital  fingerprint.  Table  4.2  shows  the  resulting 
fingerprints  for  the  ten  FPGAs.  Each  is  unique  and  with  using  only  18  values,  818  = 
18,  014,  398,  509, 481,  984  FPGAs  can  be  identified. 

Because  implementation  of  this  digital  fingerprint  will  be  done  on  a  binary 
system,  conversion  from  base-8  to  binary  results  in  a  three-fold  increase  in  the  length 
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Figure  4.8:  Transition  count  waveforms  with  sampling  errors. 


Table  4.1:  Transition  counts  for  outputs  0-15. 
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Table  4.2: 

Digital  Fingerprints  for  the  nodal  cumulative  sampling  circuit  FPGAs 
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of  the  ID  without  altering  the  number  of  possible  IDs.  This  allows  18  bits  of  the 
multiplier  output  to  become  a  54-bit  ID  or  through  restructuring  of  the  ripple  adders, 
the  entire  64-bit  output  becomes  a  192-bit  value. 

Regardless  of  which  implementation  is  used,  all  ten  FPGAs  can  be  distinctly 
identified.  The  full  64-octal  ID  for  each  FPGA  can  be  found  in  Appendix  A. 
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Figure  4.9:  Structure  of  the  transitional  sampling  circuit. 


4-5  Tranistional  Sampling 

This  circuit  involves  the  capture  of  the  multiplier  output  at  pseudo-random 
times  using  a  multiplier  output  as  a  pseudo-clock  signal  for  the  creation  of  a  digital 
fingerprint. 


4-5.1  Structure.  The  structure  of  this  circuit  is  similar  to  the  NCS  circuit. 
An  LFSR  is  attached  to  the  input  of  the  multiplier,  while  a  shift  register  is  placed 
on  the  output,  as  shown  in  Figure  4.9.  In  this  case,  the  shift  register  is  only  4  bits 
wide  so  as  to  capture  only  four  samples  of  the  output  (n0  to  n3).  Of  the  multiplier 
outputs,  nine  were  chosen  to  serve  as  the  shift  registers’  clock  input  based  on  consistent 
transition  counts.  These  were  determined  based  on  results  from  the  first  five  FPGAs 
tested  by  the  NCS  circuit.  Data  collection  on  the  second  five  FPGAs  was  performed 
simultaneously  with  this  circuit.  The  chosen  signals  outputs  are  pins:  18,  19,  21,  24, 
25,  28,  30,  34,  and  40. 

With  four  shift  registers  per  output,  64  pins  are  required  to  read  16  output  bits. 
Due  to  the  limitations  regarding  the  number  of  pins  available  on  the  FPGA  board  and 
the  number  of  fly-leads  available  for  the  logic  analyzer,  collection  of  the  data  from  the 
shift  registers  is  broken  into  four  16-bit  sections:  bits  0-15,  16-31,  32-47,  and  48-63. 
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Table  4.3: 


commonly  appearing  values 


Output  and  clock  errors  for  bits  47-32  of  FPGA  5,  Clk  25. 

,  and  output  error. 


Output 


1 

2 
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n0 


Samples 
ni  n2 


nz 


0x9258 

0x9258 

0x9208 


0x9055  0x1007  0x3a55 


0x1007 


4-5.2  ModelSim  Results.  ModelSim  confirms  the  functionality  of  the  tran¬ 
sitional  sampling  circuit.  Extrapolation  from  the  results  shows  that  changes  in  the 
transitions  on  the  shift  register  clock  input  could  result  in  radically  different  sample 
results. 


4-5.3  Transitional  Sampling  Results. 

4-5.3. 1  General  Observations.  Testing  of  the  transitional  sampling 
circuit  was  done  on  twenty  FGPAs,  utilizing  nine  multiplier  outputs  serving  as  trigger 
lines  to  capture  four  samples  per  output.  Analysis  shows  that  the  sample  values  are 
fairly  consistent  over  multiple  data  capture  runs  for  all  FPGAs  and  output  trigger 
signals.  For  a  number  of  the  output  samples,  variations  in  the  data  stored  in  the 
shift  registers  exist.  These  are  caused  by  two  different  types  of  errors.  The  first  is 
due  either  to  outputs  being  on  the  edge  of  readability  by  the  shift  register  or  clock 
signal  skew  causing  the  capture  to  be  off.  This  causes  only  a  few  bits  to  change  over 
multiple  runs.  The  second  is  due  to  the  clock  signal  being  read  inconsistently  by  the 
shift  registers  resulting  in  a  large  number  of  bits  changing  over  multiple  runs.  Both 
of  these  errors  are  due  to  the  same  underlying  cause,  namely  the  transition  being  on 
the  edge  of  readability  by  the  shift  registers.  Table  4.3  shows  the  samples  of  a  single 
board  with  both  output  and  clock  based  errors. 

4 .5. 3. 2  Transitional  Sampling  Digital  Fingerprint.  In  order  validate 
this  digital  fingerprint  method,  all  twenty  FPGAs  must  be  differentiable  across  a 
single  sample.  This  is  done  my  performing  a  bitwise  XOR  between  the  sample  values 
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Table  4.4:  FPGA  16,  Clk  25  w/  error  factors 


Output 

n0 

Samples 

Tii  n2 
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Errors 

2 

5 

13 

15 

across  all  the  FPGAs  for  a  particular  clock  signal.  Any  differences  will  result  in  a 
‘P  value  for  that  bit.  Since  errors  are  being  seen  in  the  samples,  any  differentiation 
between  FPGAs  must  also  take  into  account  errors  when  the  sample  value  is  read.  To 
do  this,  we  create  an  error  factor  for  each  sample  of  each  FPGA.  This  factor  is  the 
maximum  number  of  bits  that  a  sample  is  seen  to  change  from  its  normal  output,  i.e. 
the  one  that  shows  up  the  most  often.  An  output  error  or  a  clock  error  value  is  used 
as  the  error  factor  as  they  are  mutually  exclusive.  In  the  case  of  no  clear,  consistent 
output,  one  was  chosen  at  random  from  the  multiple  runs  of  data.  Table  4.4  adds  the 
error  factors  for  the  samples  of  the  FPGA  given  in  Table  4.3. 

Table  4.5  summarizes  the  results  showing  FPGA  differentiation  for  each  clock 
signal  and  sample  with  full  breakdown  by  bit  in  Appendix  B.  Also  shown  is  the 
breakdown  for  the  number  of  bits  that  varied  and  stayed  the  same  between  FPGAs 
for  that  clock  along  with  the  number  of  bits  that  had  an  error  for  at  least  one  FPGA 
for  that  sample  and  clock. 

Examination  of  these  results  shows  that  on  average  the  best  differentiation  is 
achieved  with  sample  n2  despite  the  number  of  errors,  with  clock  24  correctly  identi¬ 
fying  all  20  FPGAs.  The  erratic  differentiation  values  across  all  samples  and  clocks 
come  from  which  bits  have  error,  see  Appendix  B  for  the  actual  bit  vectors  and  error 
locations.  If  the  majority  of  error  is  on  bits  that  should  be  the  same  between  FPGAs, 
there  is  little  effect  on  the  differentiation.  Based  on  the  data,  these  cases  happen 
when  there  are  to  few  bits  to  differentiate  FPGAs,  thus  the  result  is  the  same.  When 
the  error  effects  bits  that  are  different  between  FPGAs,  naturally  the  differentiation 
is  greatly  reduced. 
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Table  4.5:  FPGA  differentiation  results  by  sample 


(a)  n0 


Clk 

^Differentiated 
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The  creation  of  a  digital  fingerprint  is  dependent  upon  being  able  to  have  a 
repeatable  value.  The  transitional  sampling  circuit  is  hard  to  control  and  requires 
a  good  clock  signal  that  provides  transitions  that  meet  the  shift  register’s  flip-flops’ 
setup  and  hold  times.  This  circuit  may  be  more  applicable  in  conjunction  with  nodal 
cumulative  sampling,  by  providing  an  asynchronous  read  of  the  transition  count  value. 


J^.5.3.3  Digital  Fingerprinting  Circuits  Results  vs  Total  Population. 


With  both  digital  fingerprinting  circuits,  all  FPGAs  can  be  identified,  10/10  for  the 
glitch  count  and  20/20  for  the  asynchronous  capture.  These  numbers  represent  a 
fairly  small  number  compared  to  the  total  population  of  Virtex-II  Pro  FPGAs  cur¬ 
rently  available.  In  order  to  relate  this  success  to  the  larger  population,  use  of  the 
hypergeometric  distribution  is  required. 

The  hypergeometric  distribution  gives  the  probability  that  the  total  number  of 
successes  in  a  sample  are  representative  of  a  larger  population.  To  use  this  analogy, 
take  a  box  filled  with  two  colors  of  balls.  A  sample  of  the  ball  population  shows 
that  x%  of  the  balls  are  of  one  color.  Is  that  percentage  representative  of  the  whole 
population?  Equation  (4.1)  gives  the  hypergeometric  distribution,  where  Np  is  the 
population  size,  k  is  the  number  of  items  with  the  desired  characteristic,  Ns  is  the 
total  number  of  samples  drawn,  and  x  is  the  number  of  samples  that  have  the  desired 
characteristic. 


(4.1) 


Performing  some  mathematical  manipulation,  we  can  get  the  distribution  in 
terms  of  percentages  by  extracting  the  population  and  sample  sizes,  given  by  Equa¬ 
tion  (4.2),  with  the  following  variables: 


(4.2) 
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Np  Total  population 

C  Unknown  %  of  population  with  desired  characteristic 
with  range,  0  <  x  <  1 

CNp  Total  #  in  population  with  desired  characteristic 
Ns  Sample  size 

x  %  of  samples  with  desired  characteristic,  with  range,  0  <  x  <  1 
xNs  Total  #  in  sample  with  desired  characteristic 

Using  the  assumption  from  Chapter  2  that  there  is  a  statistically  significant 
variation  in  all  FPGAs  and  therefore  the  entire  population  is  different,  then  C  —  1. 
As  x  is  determined  from  the  sample,  we  want  to  know  its  validity  for  the  entire 
population,  given  the  sample  size.  This  is  done  by  using  the  standard  deviation  of 
the  sample,  a.  To  have  high  confidence  in  the  result,  i.e.  0.997  probability  that  the 
resulting  values  are  correct,  three  times  the  standard  deviation  (3er)  is  used.  The 
result,  shown  in  Equation  (4.3),  is  a  quadratic  equation  whose  roots,  when  solved  for 
C,  give  the  upper  and  lower  bounds  for  the  probability  that  the  measured  x  value 
from  the  sample  is  correct.  The  closer  the  roots  are  to  1,  the  more  accurately  our 
differentiation  method  works  for  all  cases.  Note  that  A  is  the  confidence  level  of  the 
result  and  A  =  3  for  3cr. 


(x  -  Cf  =  A2  AU-U  (4.3) 

For  the  NCS  circuit,  C  is  solved  when  x  —  10  and  Ns  =  10  giving  upper 
and  lower  bounds  of  1.0000  and  0.5263,  respectively.  As  the  worse  case  needs  to  be 
always  considered,  therefore,  there  is  a  52.63%  probability  that  the  glitch  counting 
method  in  its  current  configuration  would  be  able  to  correctly  differentiate  between  all 
Virtex-II  Pro  FPGAs.  Solving  for  the  transitional  sampling  circuit  with  x  =  20  and 
Ns  =  20  gives  a  68.97%  correct  differentiation.  As  the  hypergeometric  distribution 
is  significantly  dependent  on  the  sample  size,  the  small  sample  sizes  used  in  this 


44 


testing  can  not  provide  high  confidence  results.  To  achieve  90%,  95%,  and  99% 
confidence,  sample  sizes  of  81,  171,  and  887  FPGAs  respectively,  assuming  complete 
differentiation,  would  have  to  be  used.  The  over  50%  confidence  for  the  two  digital 
fingerprint  circuits  shows  they  provide  better  results  than  just  guessing. 
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V.  Conclusions  and  Future  Work 


This  chapter  contains  conclusions  to  this  research  effort  and  their  significance. 
Future  research  recommendations  are  also  included  for  continued  efforts. 

5.1  Conclusions  of  Research 

With  the  increase  of  COTS  usage  in  military  applications,  the  probability  that 
critical  functionality  will  be  reverse  engineered  or  tampered  with  becomes  high.  The 
creation  of  a  digital  fingerprint  that  can  provide  a  unique  identifier  that  can  both 
identify  a  circuit  and  potentially  grant  functionality  lock-in  to  a  specific  piece  of 
silicon  becomes  a  critical  necessity  in  securing  the  United  States’  technological  edge. 

This  research  proposed  that  variations  in  semiconductor  fabrication  have  a  mea¬ 
surable  effect  and  can  be  used  in  conjunction  with  circuit  functionality  to  create  a 
digital  fingerprint.  Three  methods  of  creating  this  digital  fingerprint  were  proposed, 
asynchronous  LFSR,  nodal  cumulative  sampling,  and  transitional  sampling,  of  which 
the  latter  two  were  able  to  be  implemented  on  an  FPGA. 

Test  circuit  functionality  was  implemented  on  the  FPGA  to  gain  access  to  the 
LUTs  directly  in  order  to  properly  test  the  digital  fingerprint  methods.  The  resulting 
data  shows  variations  in  the  number  of  transitions  of  the  output  bits  across  multiple 
FPGAs.  The  variations  are  for  the  most  part  consistent  and  provide  a  good  foun¬ 
dation  to  allow  the  creation  of  a  digital  fingerprint.  The  two  fingerprint  methods 
implemented,  using  the  number  of  transitions  directly  as  a  fingerprint  and  using  the 
transitions  to  sample  signals  asynchronously,  both  resulted  in  successful  identification 
of  all  FPGAs. 

5.2  Future  Research 

There  are  a  number  of  different  follow-on  research  topics  that  should  be  inves¬ 
tigated  detailed  below,  several  of  which  may  be  thesis  topics  in  themselves. 
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5.2.1  Other  Register  Types.  Due  to  the  limitations  of  the  FPGA  used,  only 
0  — >  1  transitions  were  examined.  Through  different  implementations  of  flip-flops, 
most  likely  ASIC,  1  — >  0  transitions,  1-level  and  0-level  pulses,  or  some  combination 
of  all  four  can  be  used  to  create  a  digital  fingerprint 

5.2.2  ASIC  Asynchronous  LFSR.  The  implementation  of  the  asynchronous 
LFSR  as  an  ASIC  design  need  to  be  done  to  verify  it’s  feasibility.  While  it  should 
work  in  theory,  the  restrictions  of  the  FPGA  architecture  do  not  allow  its  creation. 

5.2.3  LFSR  Based  Nodal  Cumulative  Sampling  and  Transitional  Sampling. 
Instead  of  using  the  internal  nodes  of  circuit  to  create  the  digital  fingerprint,  a  LFSR 
could  be  used  as  part  of  a  stand  alone  digital  fingerprint  circuit.  The  LFSR  is  capable 
of  generating  pseudo-random  outputs,  based  on  tap  position,  seed  value,  and  number 
of  clock  cycles  operated.  Attaching  the  LFSR  to  another  fingerprint  generating  cir¬ 
cuit,  like  the  NCS  or  transitional  sampling,  would  result  in  a  much  more  scrambled 
fingerprint.  One  step  further  would  be  to  use  the  proposed  asynchronous  LFSR. 

5.2.4  Digital  Fingerprint  Encryption/ Decryption  Key.  Because  of  the  unique¬ 

ness  of  the  digital  fingerprint,  it  could  be  used  in  the  creation  of  an  encryption/de¬ 
cryption  key  set.  Data  would  have  to  undergo  these  processes  on  the  FPGA  as  to  keep 
the  keys  secret.  This  could  also  be  applied  to  the  FPGA  bitstream  itself,  allowing  it 
to  be  encrypted  in  such  a  way  that  only  one  FPGA  could  use  it. 

5.2.5  Stress  Testing.  An  in  depth  look  at  the  long  term  stability  of  digital 
fingerprints  proposed  in  this  thesis  must  be  done.  As  an  FPGA  or  ASIC’s  char¬ 
acteristics  can  change  over  time,  the  effect  this  has  on  the  fingerprint  needs  to  be 
examined. 
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Appendix  A.  Nodal  Cumulative  Sampling  Outputs 
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Appendix  B.  Tranistional  Sampling  Outputs 
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Tabic  B.9:  FPGA  outputs  for  Sample  0,  Clk  40 
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Table  B.ll:  FPGA  outputs  for  Sample  1,  Clk  19 
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Table  B.13:  FPGA  outputs  for  Sample  1,  Clk  24 
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Table  B.15:  FPGA  outputs  for  Sample  1,  Clk  28 
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Table  B.17:  FPGA  outputs  for  Sample  1,  Clk  34 
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Table  B.19:  FPGA  outputs  for  Sample  2,  Clk  18 
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Table  B.21:  FPGA  outputs  for  Sample  2,  Clk  21 
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Table  B.23:  FPGA  outputs  for  Sample  2,  Clk  25 
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Table  B.25:  FPGA  outputs  for  Sample  2,  Clk  30 
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Table  B.29:  FPGA  outputs  for  Sample  3,  Clk  19 
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Table  B.31:  FPGA  outputs  for  Sample  3,  Clk  24 


64 


Table  B.33:  FPGA  outputs  for  Sample  3,  Clk  28 
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Table  B.35:  FPGA  outputs  for  Sample  3,  Clk  34 
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Appendix  C.  VHDL  Entities 

Listing  C.l:  Nodal  Cumulative  Sampling  Circuit  Entities 

ENTITY  glitch_capture_setup  IS 

PORT (set  :  IN  STD_LOGIC; 

reset  :  IN  STD_L0GIC; 
clock.lfsr  :  IN  STD_L0GIC; 
clock.reg  :  IN  STD_L0GIC; 
pin_out  :  OUT  STD_L0GIC; 

reg.values  :  OUT  STD_LOGIC_VECTOR ( 1 1  DOWNTO  0)); 

END  ENTITY; 

ENTITY  glitch._captu.re  IS 

GENERIC (mult_width  :  POSITIVE  :=4); 

PORT (x  :  IN  STD_LOGIC_VECTOR( mult  _ width  - 1  DOWNTO  0); 

y  :  IN  STD_LOGIC_VECTOR (mult _ width  - 1  DOWNTO  0); 
reset  :  IN  STD_LOGIC; 
pin_out  :  OUT  STD_LOGIC; 

output_vals  :  OUT  STD_LOGIC_VECTOR ( ( mult _width) -1  ... 
DOWNTO  0));  --vector  of  multiplier  outputs 

END  ENTITY; 

ENTITY  If  sr  IS 

GENERIC  (width  :  positive); 

PORT (  clock  :  in  std_logic; 

load  :  in  std_logic_vector (  width  -  1  DOWNTO  0); 
bit_out  :  out  std_logic; 

state.out  :  out  std_logic_vector  (width  -  1  DOWNTO... 

0))  ; 

END  ENTITY  lfsr; 

ENTITY  multip_cell  IS 

PORT  (  p_in  :  IN  std_logic; 

x  :  IN  std_logi c ; 
y  :  IN  std_logi c ; 
c.in  :  IN  std_logic; 
p_out  :  OUT  std_logic; 
c_out  :  OUT  std_logic); 

END  ENTITY  mult  ip _ c e 1 1 ; 

ENTITY  multiplier  IS 

GENERIC(  width  :  positive  :=4); 

P0RT(x  :  in  std_logic_vector (width  -1  downto  0); 

y  :  in  std_logic_vector (width  -1  downto  0); 

product  :  out  std_logic_vector ( (2*width)  -1  downto  0)); 

END  ENTITY; 

ENTITY  storage  IS 

GENERIC (size  :  POSITIVE  :=4); 

PORT (clock  :  IN  STD_L0GIC; 

reset  :  IN  STD_L0GIC;  --active  low 

input  :  IN  STD_LOGIC_VECTOR ( size -1  DOWNTO  0); 

output  : OUT  STD_LOGIC_VECTOR(size-l  DOWNTO  0)); 
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END  ENTITY  storage; 


Listing  C.2:  Transitional  Sampling  Circuit  Entities 


ENTITY  glitch_read_setup  I 
PORT (set  :  IN  STD_ 
reset  :  IN 
clock  :  IN 
read_bitsO 
read_bits 1 
read_bits2 
read_bits3 
read_bits4 
read_bits5 
read_bits6 
read_bits7 
read_bits8 
read_bits9 
read_bits  1 
read_bits 1 
read_bits  1 
read_bitsl 
read_bitsl 
read  bitsl 


S 

LOGIC; 
STD_L0GIC ; 
STD_L0GIC ; 
OUT  STD_ 
OUT  STD_ 
OUT  STD_ 
OUT  STD_ 
OUT  STD_ 
OUT  STD_ 
OUT  STD_ 
OUT  STD_ 
OUT  STD_ 
OUT  STD_ 
OUT  STD 
OUT  STD 
OUT  STD 
OUT  STD 
OUT  STD 
OUT  STD 


LOGIC 

LOGIC 

LOGIC 

LOGIC 

LOGIC 

LOGIC 

LOGIC 

LOGIC 

LOGIC 

LOGIC 

LOGI 

LOGI 

LOGI 

LOGI 

LOGI 

LOGI 


_ VECTOR ( 
_ VECTOR ( 
_ VECTOR ( 
_ VECTOR ( 
_ VECTOR ( 
_ VECTOR ( 
_ VECTOR ( 
_ VECTOR ( 
_ VECTOR ( 
_ VECTOR ( 
C_VECTOR 
C_VECTOR 
C_VECTOR 
C_VECTOR 
C_VECTOR 
C_VECTOR 


15  DOWNTO  0) 

15  DOWNTO  0) 

15  DOWNTO  0) 

15  DOWNTO  0) 

15  DOWNTO  0) 

15  DOWNTO  0) 

15  DOWNTO  0) 

15  DOWNTO  0) 

15  DOWNTO  0) 

15  DOWNTO  0) 

(15  DOWNTO  0) 
(15  DOWNTO  0) 
(15  DOWNTO  0) 
(15  DOWNTO  0) 
(15  DOWNTO  0) 
(15  DOWNTO  0) )  ; 


END  ENTITY; 


ENTITY  glitch.read  IS 

GENERIC (mult_width 
PORT (x 


POSITIVE  :  =4)  ; 

IN  STD_LOGIC_VECTOR( mult  _ width  - 1  DOWNTO  0); 
y  :  IN  STD_LOGIC_VECTOR (mult _ width  - 1  DOWNTO  0); 


END  ENTITY; 


ENTITY  If  sr  IS 


PORT  ( 


et  :  IN 

STD 

.LOGIC; 

pleO  : 

OUT 

STD_LOGIC_VECTOR ( ( 2 *  mult _ w i dt h ) 

-1 

DOWNTO 

0)  ; 

plel  : 

OUT 

STD_LOGIC_VECTOR ( ( 2 *  mult _ w i dt h ) 

-1 

DOWNTO 

0)  ; 

ple2  : 

OUT 

STD_LOGIC_VECTOR ((2*mult_width) 

-1 

DOWNTO 

0)  ; 

ple3  : 

OUT 

STD_LOGIC_VECTOR ( ( 2 *  mult _ w i dt h ) 

-1 

DOWNTO 

0))  ; 

dth  :  positive) ; 

ck  :  in 

std 

-logic ; 

d  :  in 

std_ 

logic.vector (  width  -  1  DOWNTO 

0)  ; 

_out  : 

out 

std.logic ; 

state.out  :  out  std_logic_vector  (width  -  1  DOWNTO. 
0))  ; 


END  ENTITY  lfsr; 

ENTITY  multip_cell  IS 

PORT  (  p_in  :  IN  std_logic; 

x  :  IN  std_logic ; 
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y 

:  I 

N 

std_logic ; 

c_ 

in 

IN  std_log: 

Lc  ; 

P- 

out 

OUT 

std. 

-logic ; 

c_ 

out 

OUT 

std. 

.logic) 

END  ENTITY  mult 

iP- 

c  e 

i — 1 

1 — 1 

ENTITY  multipli 

er 

IS 

GENERIC (  wi 

dth 

positive 

:=4)  ; 

P0RT(x  :  in  std_logic_vector (width  -1  downto  0) ; 

y  :  in  std_logic_vector (width  -1  downto  0 
product  :  out  std_logic_vector ( (2*width) 

END  ENTITY; 


ENTITY  shif ter_sample  IS 

PORT (reset  :  IN  STD_L0GIC; 
set  :  IN  STD_L0GIC ; 

clock  :  IN  STD_L0GIC ; 
d 


:  IN 

STD_L0GIC 

i 

_out 

:  OUT 

STD 

_L0G 

_out 

:  OUT 

STD 

_L0G 

_out 

:  OUT 

STD 

_L0G 

_out 

:  OUT 

STD 

_L0G 

END  ENTITY; 


1  downto  0) ) 
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